The three datasets that will be used are:
The original and reformatted CSVs can be found in the 01 Data folder.
The ETL R scripts are Refugees_Combined_ETL.r and Religion-WRP ETL.r, both of which can be found in the 01 Data folder.
Three tables are created in the Oracle database:
CREATE TABLE Refugee_Stats (
    Asylum_Country     varchar2(4000),
    Origin_Country     varchar2(4000),
    Record_Year        number(38,4),
    Refugees           number(38,4),
    Asylum_seekers     number(38,4),
    Returned_Refugees  number(38,4),
    IDPs               number(38,4),
    Returned_IDPs      number(38,4),
    Stateless_persons  number(38,4),
    Others_of_concern  number(38,4),
    Total_Population   number(38,4)
);
TABLE “MILITARIZED_DISPUTES” (each row below: column name, data type, nullable, column ID)
STATE_ABBR VARCHAR2(4000 BYTE) Yes 1
DISPNUM3 NUMBER(38,4) Yes 2
DISPNUM4 NUMBER(38,4) Yes 3
COUNTRY_CODE NUMBER(38,4) Yes 4
START_DAY NUMBER(38,4) Yes 5
START_MON NUMBER(38,4) Yes 6
START_YEAR NUMBER(38,4) Yes 7
END_DAY NUMBER(38,4) Yes 8
END_MON NUMBER(38,4) Yes 9
END_YEAR NUMBER(38,4) Yes 10
SIDE_A_BOOL NUMBER(1,0) Yes 11
REVISIONIST_STATE_BOOL NUMBER(1,0) Yes 12
REVISION_TYPE2 NUMBER(38,4) Yes 13
REVISION_TYPE1 NUMBER(38,4) Yes 14
FATALITY_LEVEL NUMBER(38,4) Yes 15
FATALITY_PRECISE NUMBER(38,4) Yes 16
HIGHEST_ACTION NUMBER(38,4) Yes 17
HOSTILITY_LEVEL NUMBER(38,4) Yes 18
ORIGINATOR_BOOL NUMBER(1,0) Yes 19
TABLE “RELIGIONS_BY_NATION” (each row below: column name, data type, nullable, column ID)
STATE VARCHAR2(4000 BYTE) Yes 1
NAME VARCHAR2(4000 BYTE) Yes 2
DATATYPE VARCHAR2(4000 BYTE) Yes 3
SOURCERELIAB VARCHAR2(4000 BYTE) Yes 4
RECRELIAB VARCHAR2(4000 BYTE) Yes 5
RELIABILEVEL VARCHAR2(4000 BYTE) Yes 6
VERSION VARCHAR2(4000 BYTE) Yes 7
SOURCECODE VARCHAR2(4000 BYTE) Yes 8
YEAR NUMBER(38,4) Yes 9
CHRSTPROT NUMBER(38,4) Yes 10
CHRSTCAT NUMBER(38,4) Yes 11
CHRSTORTH NUMBER(38,4) Yes 12
CHRSTANG NUMBER(38,4) Yes 13
CHRSTOTHR NUMBER(38,4) Yes 14
CHRSTGEN NUMBER(38,4) Yes 15
JUDORTH NUMBER(38,4) Yes 16
JDCONS NUMBER(38,4) Yes 17
JUDREF NUMBER(38,4) Yes 18
JUDOTHR NUMBER(38,4) Yes 19
JUDGEN NUMBER(38,4) Yes 20
ISLMSUN NUMBER(38,4) Yes 21
ISLMSHI NUMBER(38,4) Yes 22
ISLMIBD NUMBER(38,4) Yes 23
ISLMNAT NUMBER(38,4) Yes 24
ISLMALW NUMBER(38,4) Yes 25
ISLMAHM NUMBER(38,4) Yes 26
ISLMOTHR NUMBER(38,4) Yes 27
ISLMGEN NUMBER(38,4) Yes 28
BUDMAH NUMBER(38,4) Yes 29
BUDTHR NUMBER(38,4) Yes 30
BUDOTHR NUMBER(38,4) Yes 31
BUDGEN NUMBER(38,4) Yes 32
ZOROGEN NUMBER(38,4) Yes 33
HINDGEN NUMBER(38,4) Yes 34
SIKHGEN NUMBER(38,4) Yes 35
SHNTGEN NUMBER(38,4) Yes 36
BAHGEN NUMBER(38,4) Yes 37
TAOGEN NUMBER(38,4) Yes 38
JAINGEN NUMBER(38,4) Yes 39
CONFGEN NUMBER(38,4) Yes 40
SYNCGEN NUMBER(38,4) Yes 41
ANMGEN NUMBER(38,4) Yes 42
NONRELIG NUMBER(38,4) Yes 43
OTHRGEN NUMBER(38,4) Yes 44
SUMRELIG NUMBER(38,4) Yes 45
POP NUMBER(38,4) Yes 46
CHRSTPROTPCT NUMBER(38,4) Yes 47
CHRSTCATPCT NUMBER(38,4) Yes 48
CHRSTORTHPCT NUMBER(38,4) Yes 49
CHRSTANGPCT NUMBER(38,4) Yes 50
CHRSTOTHRPCT NUMBER(38,4) Yes 51
CHRSTGENPCT NUMBER(38,4) Yes 52
JUDORTHPCT NUMBER(38,4) Yes 53
JUDCONSPCT NUMBER(38,4) Yes 54
JUDREFPCT NUMBER(38,4) Yes 55
JUDOTHRPCT NUMBER(38,4) Yes 56
JUDGENPCT NUMBER(38,4) Yes 57
ISLMSUNPCT NUMBER(38,4) Yes 58
ISLMSHIPCT NUMBER(38,4) Yes 59
ISLMIBDPCT NUMBER(38,4) Yes 60
ISLMNATPCT NUMBER(38,4) Yes 61
ISLMALWPCT NUMBER(38,4) Yes 62
ISLMAHMPCT NUMBER(38,4) Yes 63
ISLMOTHRPCT NUMBER(38,4) Yes 64
ISLMGENPCT NUMBER(38,4) Yes 65
BUDMAHPCT NUMBER(38,4) Yes 66
BUDTHRPCT NUMBER(38,4) Yes 67
BUDOTHRPCT NUMBER(38,4) Yes 68
BUDGENPCT NUMBER(38,4) Yes 69
ZOROGENPCT NUMBER(38,4) Yes 70
HINDGENPCT NUMBER(38,4) Yes 71
SIKHGENPCT NUMBER(38,4) Yes 72
SHNTGENPCT NUMBER(38,4) Yes 73
BAHGENPCT NUMBER(38,4) Yes 74
TAOGENPCT NUMBER(38,4) Yes 75
JAINGENPCT NUMBER(38,4) Yes 76
CONFGENPCT NUMBER(38,4) Yes 77
SYNCGENPCT NUMBER(38,4) Yes 78
ANMGENPCT NUMBER(38,4) Yes 79
NONRELIGPCT NUMBER(38,4) Yes 80
OTHRGENPCT NUMBER(38,4) Yes 81
SUMRELIGPCT NUMBER(38,4) Yes 82
TOTAL NUMBER(38,4) Yes 83
DUALRELIG NUMBER(38,4) Yes 84
For the Religions by Nation dataset:
Interesting Findings:
China vs Japan: Even though China and Japan differ strikingly in total population, their Buddhist populations were comparable in the early period (1945 – 1970), with Japan’s slightly higher. From 1970 onwards, however, the Buddhist population in China increased markedly and steadily, far outstripping Japan’s.
The interesting point about China’s increase is that, although it was rapid, it was interrupted by a sudden drop (of no less than 50%) between 1995 and 2000, followed by an equally sudden recovery and plateau in 2005.
For Vietnam, after a long period of slow but steady increase in the Buddhist population between 1955 and 1995, there is a sharp increase followed by a plateau in 2000 (from 6.5M in 1995 to 40M in 2000).
For the Religions by Nation dataset:
Interesting Findings:
The behavior of the nonreligious percentage in Czechoslovakia, Finland, and the USA is considered normal, so these countries are shown only for comparison.
The interesting behaviors are in countries that show an abrupt change in trend, or a sudden increase or decrease in nonreligious percentage of at least 10%.
Russia: sudden drop from 63% in 1995 to 12% in 2000.
North Korea: 79% in 1995 to 64% in 2000.
Jamaica: downward slope from 1960 to 1980, then a sharp turn upwards to 1990, followed by a gradual decrease.
Cuba: sharp increase from 0% in 1965 to 25% in 1970.
China: sharp increase in the period 1945-1960.
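The "sudden change of at least 10%" criterion can be sketched in a few lines. This is only an illustration, not the method used to build the visualization: it assumes the threshold means 10 percentage points between consecutive observations, the function name flag_jumps is hypothetical, and the sample values are just the pairs quoted in the findings above.

```python
def flag_jumps(series, threshold=10.0):
    """Return (start_year, end_year, change) for each pair of consecutive
    observations whose nonreligious percentage differs by at least
    `threshold` percentage points (assumed reading of "at least 10%")."""
    years = sorted(series)
    flagged = []
    for prev, curr in zip(years, years[1:]):
        change = series[curr] - series[prev]
        if abs(change) >= threshold:
            flagged.append((prev, curr, change))
    return flagged


# Only the figures quoted in the findings; not the full dataset.
nonrelig_pct = {
    "Russia": {1995: 63.0, 2000: 12.0},
    "North Korea": {1995: 79.0, 2000: 64.0},
    "Cuba": {1965: 0.0, 1970: 25.0},
}

for country, series in nonrelig_pct.items():
    print(country, flag_jumps(series))
```

A country whose series never moves by 10 points between observations returns an empty list, which matches the "normal" comparison countries.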
For the Militarized Disputes dataset:
Comment: The result shows that one-year disputes are by far the most common, and the count drops significantly as dispute length increases, which is expected and not interesting.
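A minimal sketch of how such a dispute-length count could be computed from the START_YEAR and END_YEAR columns. The length definition is an assumption (a dispute that starts and ends in the same calendar year counts as 1 year), and the rows are made up for illustration rather than taken from the table.

```python
from collections import Counter


def dispute_length_years(start_year, end_year):
    # Assumption: same-year disputes have length 1, so add 1 to the difference.
    return end_year - start_year + 1


# Made-up (START_YEAR, END_YEAR) pairs, not real records.
rows = [(1950, 1950), (1962, 1962), (1965, 1966), (1980, 1988), (1991, 1991)]

length_counts = Counter(dispute_length_years(s, e) for s, e in rows)

for length, count in sorted(length_counts.items()):
    print(f"{length} year(s): {count} dispute(s)")
```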
From this you can see that while populations often split comfortably across Protestant, Catholic, and other denominations, they either subscribe to one of the major Buddhist and Islamic sects entirely or not at all.
When I tried paginating these results by year, something very surprising came to light:
Where’d all the ‘others’ go? It would appear from this graph that between 1995 and 2000 an astonishing number of ‘other’ Muslims (unaffiliated or belonging to less-represented denominations) became either Sunni or Shiite, and mostly Sunni, by a large margin.
I decided to delve further, and look at each country individually. There, a pattern emerged, as you can see below:
This is not a universal pattern, but you’ll see that more than 40 countries show a near-total reorganization of their Muslim population into Sunni/Shiite between 1995 and 2000.
And if that’s not odd enough, 6 Buddhist countries show the same reorganization:
Nope! There’s a moderate geographic correlation with the Buddhist shift, but with two major outliers (33% of the shift), and there’s no regional correlation at all with the Islamic shift.
I found myself trawling Wikipedia and world news articles for major events in the Muslim and Buddhist worlds, but that quickly devolved into the deepest reaches of conjecture. All we can say for sure from this data is that something massively significant happened in these two religious communities in the second half of the ’90s.
Started with the Military Disputes dataset.
Wanted to take a look at the military disputes data. A couple key terms…
Made a crosstab to look at the different disputes.
- To get country names, blended the data with a dataset that maps country abbreviations to their full names.
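Outside Tableau, the blend described above is roughly a left join against a lookup table. The sketch below is only an approximation of that step: the lookup contents, the field names, and the helper add_country_names are all hypothetical.

```python
# Hypothetical lookup from state abbreviation to full country name.
abbr_to_name = {
    "USA": "United States of America",
    "CHN": "China",
    "JPN": "Japan",
}


def add_country_names(dispute_rows, lookup):
    """Left join: keep every dispute row and attach the full name,
    or None when the abbreviation is missing from the lookup."""
    return [
        {**row, "country_name": lookup.get(row["state_abbr"])}
        for row in dispute_rows
    ]


# Made-up dispute rows for illustration.
disputes = [
    {"state_abbr": "CHN", "start_year": 1950},
    {"state_abbr": "XYZ", "start_year": 1960},
]

named = add_country_names(disputes, abbr_to_name)
for row in named:
    print(row)
```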
The military dispute dataset.
Okay, so can we correlate this data to the refugee movement out of these countries to see if we find any trends?
-The Refugee dataset.
Looking for trends… Method:
- Inner join on our refugee data and the military dispute data.
- Cross tab, using the KPI of percentage of a country’s population that moved out of country as refugees in the same year as a dispute.
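The join and KPI above can be sketched as follows. This is an illustration under stated assumptions, not the actual Tableau calculation: the join key is taken to be (country, year) using the dispute start year, the population lookup is a hypothetical input, and all rows and country labels are made up.

```python
def refugee_share_in_dispute_years(refugee_rows, dispute_rows, population):
    """Inner join refugees to disputes on (country, year), then return
    {(country, year): percent of population counted as refugees}."""
    # Assumption: a dispute is matched on its start year only.
    dispute_keys = {(d["state"], d["start_year"]) for d in dispute_rows}

    totals = {}
    for r in refugee_rows:
        key = (r["origin_country"], r["record_year"])
        # Inner join: drop refugee rows with no dispute in the same year.
        if key in dispute_keys and key in population:
            totals[key] = totals.get(key, 0) + r["refugees"]

    return {key: 100.0 * n / population[key] for key, n in totals.items()}


# Made-up rows; "A" and "B" are placeholder country labels.
refugee_rows = [
    {"origin_country": "A", "record_year": 1990, "refugees": 500},
    {"origin_country": "A", "record_year": 1990, "refugees": 1500},
    {"origin_country": "B", "record_year": 1990, "refugees": 300},
]
dispute_rows = [{"state": "A", "start_year": 1990}]
population = {("A", 1990): 100_000, ("B", 1990): 50_000}

kpi = refugee_share_in_dispute_years(refugee_rows, dispute_rows, population)
print(kpi)
```

Because the match is only on country and year, the KPI describes refugees recorded in the same year as a dispute and says nothing about whether the dispute caused them.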
So still not many trends, but the data for China is interesting. A novice mistake would be to say that these conflicts created over a million refugees. BUT, we must remember that this is refugees in the same year as the conflict, not due to the conflict itself!
As it turns out, some of these conflicts that are rated as “Wars” actually only lasted one day! But it’s still interesting that China had over a million refugees that year…
So let’s look at China, decade by decade…
Method:
- Filter by years and State name.
- Include a term to track the number of conflicts through the decade.
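A minimal sketch of the filter and the per-decade conflict counter described above. The field names, the helper conflicts_per_decade, and the rows are hypothetical, and assigning a dispute to a decade by its start year is an assumption.

```python
from collections import Counter


def conflicts_per_decade(dispute_rows, state, first_year, last_year):
    """Filter disputes by state and year range, then count them per decade.
    Assumption: a dispute belongs to the decade of its start year."""
    counts = Counter()
    for d in dispute_rows:
        if d["state"] == state and first_year <= d["start_year"] <= last_year:
            decade = d["start_year"] // 10 * 10
            counts[decade] += 1
    return dict(sorted(counts.items()))


# Made-up rows; "X" and "Y" are placeholder state names.
dispute_rows = [
    {"state": "X", "start_year": 1951},
    {"state": "X", "start_year": 1958},
    {"state": "X", "start_year": 1962},
    {"state": "Y", "start_year": 1955},
    {"state": "X", "start_year": 2005},
]

per_decade = conflicts_per_decade(dispute_rows, "X", 1950, 1999)
print(per_decade)
```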